UIUCDCS-R-78-925 




June  1978 




UILU-ENG  78  1717 


PREDICTING  GRADUATE  SCHOOL  SUCCESS 
by 
Richard  H.  Nelson 


DEPARTMENT  OF  COMPUTER  SCIENCE 
UNIVERSITY  OF  ILLINOIS  AT  URBANA-CHAMPAIGN 


URBANA,   ILLINOIS 



Spring  1978 


Submitted  in  partial  fulfillment  of  the  requirements  for  the  degree  of 
Master  of  Computer  Science  in  the  Graduate  College  of  the  University  of 
Illinois  at  Urbana-Champaign,  1978 


TABLE OF CONTENTS

INTRODUCTION

STATISTICAL OVERVIEW

METHOD

RESULTS AND DISCUSSION

SUMMARY AND RECOMMENDATIONS

APPENDICES


INTRODUCTION 

The purpose of this research was to identify the variables affecting
"success" in graduate school in computer science. The motivation behind
the study was the desire to obtain an objective criterion for predicting
an individual's chances of success. Such a criterion would be a valuable
tool in the process of graduate admissions: evaluations would be
standardized, prediction would improve, and selections would be simpler
and faster.

The subjects for the study were graduate students entering the
Department of Computer Science at the University of Illinois from spring
1974 to fall 1975. In all there were 170 entering graduate students, but
adequate data existed for only 73. [1] These 73 form the sample under
investigation.

Before  attempting  to  predict  success  in  graduate  school,  it  was 
first  necessary  to  form  an  operational  definition  of  what  constitutes 
graduate  school  success.   There  are  several  measurable  achievement  criteria, 
two  of  which  are  first  semester  grade  point  average  (GPA)  and  cumulative 
graduate  GPA.   A  third  available  measure  was  the  number  of  times  the 
Department's  general  exams  had  to  be  taken  by  each  student.   Data  were 
collected  on  each  of  these  measures  and  multiple  regression  analyses  were 
performed. 


[1] Although the research was exploratory, it was felt that GRE scores
(particularly the GRE quantitative score) and undergraduate GPA were
important enough variables to warrant the exclusion of any case lacking
data on either one. As it turned out, GPA could usually be estimated
from the transcript, and thus the vast majority of excluded cases were
lacking, at the very least, GRE scores.

But these were not the main foci of the study. Although each
of these variables was easily measurable and achievement-oriented, it
was not felt that any one of them truly captured what is meant by a
"successful" graduate school career. A student obtaining a GPA of 5.00
the first semester but then leaving graduate school cannot be considered
successful. Similarly, a student who passes each general exam on the
first attempt but drops out of school after that semester would not
ordinarily be considered successful. Actually, the achievement of
obtaining a graduate degree is ultimately the best measure of graduate
school success, and a student who earns a Ph.D. is ordinarily thought
to be more successful than one who earns a master's degree. This
definition of success is probably one with which most department heads
and admissions committees could be comfortable. This degree criterion
is thus the major dependent variable of this research. Because several
years had elapsed since the students originally entered graduate school,
it was possible to divide the sample into three groups: one consisting
of those students who dropped out of the department before receiving
any graduate degree; a second comprising those who received master's
degrees [2] and then departed; and a third group consisting of those who
passed the department's qualifying exam, a necessary precondition for
study leading to the Ph.D. All members of the second group either did
not take the qualifying exam or failed it, and thus the achievement of
passing the qualifying exam is taken as an indication of eventual
completion of the Ph.D. [3]

[2] No distinction was made between the different master's programs since,
of the 28 students in the master's group, only one received an M.C.S.
and one an M.S.T.C.S.

The  statistical  technique  of  discriminant  analysis  was  used 
to  determine  the  dimensions  along  which  the  three  groups  differ.   The 
results  of  this  analysis  form  the  first  portion  of  the  following 
discussion.   The  strength  of  the  relationship  obtained  is  then  discussed 
by viewing discriminant analysis as a special case of canonical correlation.
Finally,  the  results  of  the  discriminant  analysis  —  the  predictor 
variables  and  the  goodness  of  the  prediction  —  are  compared  to  the 
results  of  the  multiple  regressions  mentioned  above.   This  is  done 
to  ascertain  whether  the  two  procedures  do  indeed  yield  different  results 
and  thus,  to  what  extent  the  degree  criterion  proposed  differs  from 
measures  such  as  GPA. 

But  before  proceeding  to  the  details  of  the  analysis,  a  brief 
discussion  is  offered  of  the  major  statistical  techniques  employed.   A 
more  extended  treatment  can  be  found  in  Tatsuoka  (1971). 

STATISTICAL  OVERVIEW: 

Multiple regression is the statistical technique whose goal
is to form a linear combination of the independent variables
$(X_1, \ldots, X_p)$,

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$$

such that the correlation ($r_{Y\hat{Y}}$) between this linear combination
and the dependent variable (Y) is maximal. An equivalent formulation in
terms of least squares gives the goal of multiple regression as minimizing
the sum of squared discrepancies, i.e., finding a linear combination of
the X's such that

$$\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \text{minimum},$$

where N = number of observations. The solution is obtained by taking the
partial derivative of the expression

$$\sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{N} (Y_i - b_0 - b_1 X_{i1} - \cdots - b_p X_{ip})^2$$

with respect to each of the variables $b_0, b_1, \ldots, b_p$, setting each
derivative to zero, and solving the resulting set of linear equations for
$b_0, \ldots, b_p$. The equation can then be used to predict Y (say GPA)
for incoming students.

[3] The original research design called for studying a class of students who
had all ended their graduate school careers. The poor quality of the data
available from administrative sources and the fact that the department's
computerized records only date back to 1974 made this infeasible.
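The least-squares solution for multiple regression described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `fit_least_squares` and `predict` and the toy data are hypothetical and do not come from the study, and the normal equations are solved by plain Gauss-Jordan elimination rather than by whatever routine the original analysis used.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bp] minimizing sum((y_i - yhat_i)**2).

    X is a list of observation rows (p predictors each); y is the list
    of observed values of the dependent variable.
    """
    n, p = len(X), len(X[0])
    m = p + 1
    # Design matrix with a leading column of ones for the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    # Setting each partial derivative to zero gives the normal equations
    # (A'A) b = A'y, stored here as an augmented matrix.
    M = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(m)]
         + [sum(A[i][r] * y[i] for i in range(n))]
         for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][m] / M[r][r] for r in range(m)]


def predict(b, x):
    """Predicted Y for one new observation x, given fitted weights b."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical toy data generated exactly from Y = 1 + 2*X1 + 3*X2.
X_demo = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_demo = [1 + 2 * x1 + 3 * x2 for x1, x2 in X_demo]
b_demo = fit_least_squares(X_demo, y_demo)
```

Because the toy data contain no noise, the fitted weights reproduce the generating equation; with real admissions data the same call would return the weights that minimize the squared prediction errors.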

Discriminant analysis is the appropriate statistical technique
when the sample is divided into two or more clearly defined groups, and
the object of the research is to determine how the groups differ with
respect to the p independent variables in the system. The technique is
conceptually similar to multiple regression, but the dependent variable
is now categorical rather than continuous. The goal of discriminant
analysis is the formation of a linear combination of the independent
variables which maximally differentiates between the K groups. The
criterion of differentiation is the ratio of the between-groups to the
within-groups sums of squares on the new variable Y (the linear combination
of the independent variables). The between-groups sum of squares is

$$SS_b(Y) = \sum_{k=1}^{K} n_k (\bar{Y}_k - \bar{Y})^2,$$

where $n_k$ is the number of subjects in group k, $\bar{Y}_k$ is the mean
of Y for group k, and $\bar{Y}$ is the grand mean of all the subjects on Y.
The within-groups sum of squares can be expressed as

$$SS_w(Y) = SS_1(Y) + SS_2(Y) + \cdots + SS_K(Y) = \sum_{k=1}^{K} \sum_{i=1}^{n_k} (Y_{ki} - \bar{Y}_k)^2,$$

where $Y_{ki}$ is the score of the ith individual in the kth group on
variable Y.

Solution of the problem is readily obtained upon solving the
eigenvector-eigenvalue problem $(W^{-1}B - \lambda I)v = 0$, where

W = the within-groups sums of squares and cross products (SSCP) matrix

B = the between-groups SSCP matrix

v = the vector of combining weights for the p independent variables

$\lambda = SS_b(Y) / SS_w(Y)$.

In general, solution of the eigenvalue problem results in the identification
of minimum(p, K-1) = r eigenvalues and associated eigenvectors. So in
general, discriminant analysis produces r discriminant functions such that
the first function is that linear combination of the independent variables
which yields the largest discriminant criterion $\lambda_1$. The second
function corresponds to $\lambda_2$, the largest discriminant criterion of
any linear combination of the independent variables which is uncorrelated
with the first linear combination. Similarly, the rth discriminant function
yields $\lambda_r$, the largest discriminant criterion of any linear
combination of the variables which is uncorrelated with the previous r-1
discriminant functions. Variables which make large contributions to
between-group discrimination are those which would be most useful in
selecting prospective graduate students.
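To make the discriminant criterion concrete, the sketch below evaluates the ratio of between-groups to within-groups sums of squares for one candidate weight vector. It does not solve the eigenproblem; it only scores a given linear combination, which is the quantity the eigen solution maximizes. The function name `discriminant_criterion` and the two small groups are hypothetical illustrations, not data from the study.

```python
def discriminant_criterion(groups, weights):
    """Return (SS_b, SS_w, SS_b / SS_w) for Y = sum_j weights[j] * X_j.

    groups is a list of K groups; each group is a list of observation
    vectors holding the p independent variables.
    """
    # Score every observation on the linear combination Y.
    scores = [[sum(w * x for w, x in zip(weights, obs)) for obs in g]
              for g in groups]
    n_total = sum(len(s) for s in scores)
    grand_mean = sum(sum(s) for s in scores) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for s in scores:
        mean_k = sum(s) / len(s)
        # n_k times the squared distance of the group mean from the grand mean.
        ss_between += len(s) * (mean_k - grand_mean) ** 2
        # Squared deviations of each member from its own group mean.
        ss_within += sum((y - mean_k) ** 2 for y in s)
    return ss_between, ss_within, ss_between / ss_within


# Hypothetical data: two groups, two variables. The first variable
# separates the groups; the second does not.
demo_groups = [
    [(1, 5), (2, 4), (3, 6)],
    [(4, 5), (5, 6), (6, 4)],
]
ratio_first = discriminant_criterion(demo_groups, (1, 0))[2]
ratio_second = discriminant_criterion(demo_groups, (0, 1))[2]
```

A weight vector that loads on the separating variable yields a large ratio, while one that loads only on the uninformative variable yields a ratio of zero, which is why the first kind of variable would be favored in selection.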

Canonical correlational analysis is applicable when dealing
with two sets of variables (e.g., q dependent as well as p independent
variables). The goal is to form linear combinations of each of the
two sets such that the correlation between the linear combinations
is a maximum. Determining the sets of coefficients is accomplished,
after appropriate manipulations, by solving the eigen equation

$$(S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} - \mu^2 I)u = 0$$

where

$S_{xx}$ = the SSCP matrix for the X set of variables

$S_{yy}$ = the SSCP matrix for the Y set of variables

$S_{xy}$ = the p x q cross products matrix between the X and Y sets
of variables

$S_{yx}$ = the q x p cross products matrix between the Y and X sets
of variables

$\mu$ = the correlation between the linear combinations

u = the vector of coefficients for the X set.

In general, solution of the eigen equation yields minimum(p, q) = r
pairs of u and v vectors, v being the corresponding vector of coefficients
for the Y set. Each pair of linear combinations is called a canonical
variate. The interpretation is that the first canonical variate is that
pair of linear combinations of the two sets of variables which is
maximally correlated. The second canonical variate produces the maximum
correlation between linear combinations which are uncorrelated with the
first variate. And the rth canonical variate yields the maximum
correlation between any pair of linear combinations uncorrelated with
the first r-1 canonical variates.
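The eigen equation above is easiest to check in its simplest special case. With one variable in each set (p = q = 1), every matrix reduces to a scalar, the single eigenvalue is the squared cross product divided by the product of the two sums of squares, and the canonical correlation is the absolute value of the ordinary correlation. The sketch below covers only that special case; the helper names and the sample vectors are hypothetical.

```python
from math import sqrt


def sscp(a, b):
    """Sum of cross products of deviations from the means of a and b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def canonical_correlation_1x1(x, y):
    """Canonical correlation when each set holds a single variable.

    With scalars, S_xx^-1 * S_xy * S_yy^-1 * S_yx collapses to
    s_xy**2 / (s_xx * s_yy), which is the eigenvalue mu squared.
    """
    mu_squared = sscp(x, y) ** 2 / (sscp(x, x) * sscp(y, y))
    return sqrt(mu_squared)


# Hypothetical data: s_xy = 4, s_xx = 5, s_yy = 5, so mu**2 = 16/25.
mu_demo = canonical_correlation_1x1([1, 2, 3, 4], [1, 3, 2, 4])
```

For larger sets the same product of matrices has to be formed and its eigenvalues extracted numerically, but the scalar case shows what the eigenvalue measures.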

METHOD:

The  independent  and  dependent  variables  for  which  data  were 
collected  are  listed  in  Table  1.   Where  appropriate,  an  explanation  of 
the  variable,  scaling  technique,  or  coding  method  is  included.   The 
actual  raw  data  values  are  given  in  Appendix  1. 
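Two of the coding rules described in Table 1, the general exam score and the verbal GRE-foreign interaction term, are simple enough to state as code. The sketch below is one reading of those rules; the function names are hypothetical, and it assumes that an exam passed on the kth attempt contributes k points, which is what the stated range of 4 to 16 implies.

```python
def exam_points(attempts_to_pass):
    """Points for one general exam.

    attempts_to_pass is 1, 2, or 3 when the exam was passed on that
    attempt, or None when it was failed three times or had not been
    passed when the student left school (scored as 4).
    """
    return 4 if attempts_to_pass is None else attempts_to_pass


def general_exam_score(results):
    """Total over the four substantive areas; ranges from 4 to 16."""
    if len(results) != 4:
        raise ValueError("expected results for exactly four exams")
    return sum(exam_points(r) for r in results)


def verbal_foreign_interaction(is_foreign, gre_verbal):
    """Interaction = foreign (0 or 1) * GRE verbal."""
    return (1 if is_foreign else 0) * gre_verbal
```

Under this coding a domestic student always has an interaction value of zero, so the term carries verbal GRE information for foreign students only.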

Table 1

Variable                          Explanation

Independent:

undergraduate GPA                 Scale of 1.00 to 5.00

GRE verbal percentage

GRE quantitative percentage

average score over three          Scale of 0 to 4 with 0 = poor and 4 = superior
letters of reference

Astin selectivity index           Scale of 1 to 7. The index is based on the average
of undergraduate                  ACT and SAT scores of entering freshmen at the
college attended [4]              college; it thus measures the general ability
                                  level of the students and the selectivity level
                                  of the college (see Appendix 2 for sample
                                  Astin scores)

undergraduate major               0 = computer science
                                  1 = CS-related field (e.g. engineering or math)
                                  2 = field unrelated to computer science

highest prior degree              0 = bachelor's
obtained                          1 = master's

# of C.S. courses taken
as an undergraduate

GPA in C.S. courses

interaction between               foreign = 0 if no, 1 if yes
verbal GRE score                  interaction = foreign * GRE verbal
and whether student
was foreign

Dependent:

1st semester graduate GPA         If student dropped out prior to end of first
                                  semester, GPA = 2.00

cumulative graduate GPA

number of times the               To obtain an M.S., general exams must be passed in
general exams were                four substantive areas. Each exam can be taken
taken                             a maximum of 3 times. A score of 4 was assigned
                                  to an exam which was failed 3 times or which had
                                  not been passed when the student left school.
                                  The range of scores was thus 4 to 16: 4 if each
                                  test was passed the first time, and 16 when each
                                  test was never passed.

See Appendices 3 and 4 for correlation matrices of independent and independent vs.
dependent variables, respectively.

[4] No Astin rating was available for foreign institutions. To obtain estimated scores,
the Astin index was first stepwise-regressed on the other independent variables and
this equation was applied to the 16 foreign students. For α = .05, there were
three significant predictors and a multiple R of .50468. The equation was:

Astin = 2.234 - 1.147*largest degr - 1.165*undergrad GPA + .0874*GRE quant

With a total sample size of only 73, it was necessary to find
some way to limit the number of independent variables included in the
discriminant analysis. Although inclusion of all ten variables would
maximize the discriminating power in the given sample, the coefficients
of the resulting discriminant functions would be strongly affected by
sampling error: the greater the number of variables relative to sample
size, the greater the effect of sampling error and the larger the
discrepancies between the computed coefficients and those which would
be obtained from a different sample. Since the goal of this study was
not to maximize the variability explained for the given sample, but to
obtain reasonable prediction for future samples, it was thought that a
moderate number of independent variables, relative to the sample size,
should be selected. Due to the exploratory nature of the research, there
were no a priori grounds upon which to select a subset of the variables.
The procedure followed was to first run a stepwise discriminant analysis
which utilized a forward selection technique, i.e., variables were
included in the analysis one by one in such a way that the ith variable
selected was that variable which added the most to the discriminating
power given by the first (i-1) variables. Given the order of variable
selection, analyses were then run using the "best" i variables,
3 ≤ i ≤ 10, and appropriate statistics were collected which reflected
the "goodness" of the discriminating power for that particular number
of independent variables. Finally, the aim was to select for further
analysis the smallest number of variables while retaining as much
predictive power as possible.
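The forward selection procedure described above is a greedy loop, and its skeleton can be sketched independently of the statistic used. In the sketch below `criterion` stands in for whatever measure of discriminating power the stepwise routine evaluates; the study does not state which statistic was used, so both the function name and the toy criterion are hypothetical.

```python
def forward_select(candidates, criterion):
    """Return the candidates in their order of inclusion.

    At step i the variable chosen is the one that, added to the i-1
    variables already selected, gives the largest criterion value.
    criterion takes a list of variable names and returns a number.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda v: criterion(selected + [v]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical criterion: each variable contributes a fixed gain, but
# "a" and "b" overlap, so holding both earns less than the sum of the two.
GAINS = {"a": 0.5, "b": 0.4, "c": 0.1}


def toy_criterion(subset):
    total = sum(GAINS[v] for v in subset)
    if "a" in subset and "b" in subset:
        total -= 0.35  # penalty for redundant information
    return total


order_demo = forward_select(["a", "b", "c"], toy_criterion)
```

In the toy run the redundant variable enters last even though its individual gain is the second largest, which illustrates why the order of inclusion reflects added rather than standalone discriminating power.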

From this procedure it was decided to retain the best six
variables for inclusion in the following analyses and discussion. There
are several justifications for this decision. First, it is seen from
Table 2 that when using six variables, the maximum percentage for
correctly classifying cases into groups is reached. Second, the
obtained canonical correlations of .548 and .431 between group membership
and the independent variables are reasonably close to the maximum
achieved using all the variables. Finally, although the correlations
obtained using four or five variables are relatively close to those
for six variables, there are theoretical reasons for retaining all six
variables. It is seen that the interaction term between verbal GRE
scores and foreign student classification was the sixth variable selected.
Exclusion of this variable from the analysis would confound any
interpretation of the effect of GRE verbal scores, since the sample
correlation between verbal GRE and foreign student classification was
-.79. Given that verbal GRE is to be included in the analysis (it was
stepwise selected third), its interaction with the foreign classification
should be included also. Reducing the effect of sampling errors (as
manifested by the percentage of foreign students in the given sample)
also argues for inclusion of the interaction term. For all the above
reasons, the following results are based on the effects of six
independent variables.

Table 2

Summary of stepwise discriminant analyses

                           Canonical         % of cases
                           correlations      correctly classified

with best 3 variables      .479, .291        50.68%
with best 4 variables      .486, .411        60.27%
with best 5 variables      .511, .430        60.27%
with best 6 variables      .548, .431        64.38%
with best 7 variables      .557, .457        63.01%
with best 8 variables      .561, .465        63.01%
with best 9 variables      .565, .480        60.27%
with all 10 variables      .566, .483        64.38%

Order of inclusion:

 1. Astin selectivity index
 2. letters of reference
 3. GRE verbal
 4. GRE quantitative
 5. undergraduate major
 6. verbal GRE-foreign student interaction
 7. highest prior degree
 8. undergraduate GPA
 9. GPA in undergraduate CS courses
10. number of undergraduate CS courses taken

RESULTS  AND  DISCUSSION: 

Discriminant Analysis: With K=3 groups and p=6 variables,
two discriminant functions are obtained. Their significance can be
seen from the following: Given that $\lambda_1 = .42921$ and
$\lambda_2 = .22755$, Bartlett's V is seen to be

$$V = (N-1-(p+K)/2) \sum_{m} \ln(1+\lambda_m) = 24.106 + 13.839 = 37.945$$

which is approximately distributed as a chi-square with p(K-1) = 12
degrees of freedom provided N-1-(p+K)/2 is large (Tatsuoka, p. 164).
Since $\chi^2_{12,\,.001} = 32.909$ the difference in group centroids is
highly significant. To see that both functions are significant it is only
necessary to look at $V_2 = V - V_1 = 13.839$ with (p-1)(K-2) = 5 degrees
of freedom. Since $12.833 = \chi^2_{5,\,.025} < V_2 < \chi^2_{5,\,.01} = 15.086$
the second discriminant function is significant at the .025 level.
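The arithmetic behind Bartlett's V can be reproduced directly from the two eigenvalues. The sketch below uses only the figures quoted in the text (N = 73, p = 6, K = 3); the variable and function names are illustrative. It also applies the standard relation between a discriminant criterion and its canonical correlation, r = sqrt(λ / (1 + λ)), which is not spelled out in the text but agrees with the canonical correlations of .548 and .431 reported for the six-variable analysis.

```python
from math import log, sqrt

N, p, K = 73, 6, 3                 # sample size, variables, groups
eigenvalues = [0.42921, 0.22755]   # discriminant criteria lambda_1, lambda_2


def bartlett_terms(lams, n, n_vars, n_groups):
    """Per-function contributions to Bartlett's V."""
    factor = n - 1 - (n_vars + n_groups) / 2
    return [factor * log(1 + lam) for lam in lams]


terms = bartlett_terms(eigenvalues, N, p, K)
V = sum(terms)        # overall test, p*(K-1) = 12 degrees of freedom
V2 = V - terms[0]     # residual after the first function, (p-1)*(K-2) = 5 df

# Critical values quoted in the text.
overall_significant = V > 32.909        # chi-square, 12 df, alpha = .001
second_significant = 12.833 < V2        # chi-square, 5 df, alpha = .025

# Canonical correlation implied by each eigenvalue.
canonical_r = [sqrt(lam / (1 + lam)) for lam in eigenvalues]
```

Running the block gives contributions that round to the 24.106 and 13.839 reported above, so the published eigenvalues, test statistic, and canonical correlations are mutually consistent.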

In attempting to interpret the functions and the corresponding
dimensions along which the groups differ, first consider the plot of
cases  and  group  means  on  each  function  given  in  Table  3.   It  is  seen 
that  on  function  one  the  means  of  groups  1  (dropout)  and  2  (master's)  are 
nearly  identical  while  that  of  group  3  (Ph.D.)  is  considerably  less;  whereas 
on  function  two  the  groups  are  spaced  at  roughly  equal  intervals  with  group 
2  having  the  highest  mean  and  group  1  the  lowest.   To  determine  the  relative 
contribution  of  each  original  variable  to  each  discriminant  dimension  it 
is  necessary  to  look  at  the  standardized  discriminant  weights  given  in 
Table  4. 

Table 4

Standardized Discriminant Weights

                               Function 1     Function 2

Astin index                     -.30133         .99178
letters of recommendation       -.38304         .31091
GRE verbal                       .37866         .38702
GRE quantitative                -.38939        -.57703
undergraduate major              .40701         .24461
verbal-foreign interaction      -.39696        -.05164

These coefficients allow for no simple labelling of the two
dimensions. On the first dimension, each original variable contributes
roughly equally. Thus, a low score on dimension one (e.g. the Ph.D.
group mean) is associated with a high Astin index, high letters of
recommendation, high GRE quantitative, undergraduate major in the
direction of computer science, low GRE verbal, and high GRE verbal-foreign
student interaction. The first four characteristics are what
one would expect of the Ph.D. group. The low verbal GRE scores and
high verbal GRE-foreign interaction indicate a higher proportion of
foreign students in the Ph.D. group than the other two groups. This
is seen to be the actual case, as the Ph.D. group is composed of 39%
foreign students, the master's group 11%, and the dropout group 23%.

[Table 3. Plot of cases and group means on the two discriminant functions.]

Given  the  interpretation  of  the  first  function,  the  second 
function  is  best  viewed  as  a  discriminator  between  groups  one  and  two. 
The  position  of  the  group  three  mean  between  those  of  groups  one  and 
two  makes  any  interpretation  of  this  dimension  which  includes  group 
three  difficult  at  best.   The  conceptual  simplicity  obtained  by  viewing 
dimension  one  as  separating  the  Ph.D.  group  from  the  other  two  and 
dimension  two  as  separating  the  master's  from  the  dropout  group  also 
argues strongly for adopting this scheme. When this is done, the
magnitudes  of  the  standardized  discriminant  weights  show  that  the 
Astin  index  is  by  far  the  primary  variable  involved  in  this  dimension, 
with  GRE  quantitative  being  roughly  half  as  important  but  in  the 
opposite  direction,  and  the  rest  occupying  a  background  role.   Thus 
someone  scoring  high  on  dimension  two  (e.g.  the  master's  group  mean) 
would  tend  to  have  a  very  high  Astin  index  and  a  relatively  low  GRE 
quantitative  score.   Theoretically,  this  runs  counter  to  expectations. 
It  would  be  expected  that  the  master's  group  would  have  a  higher  mean 
score  on  both  the  Astin  index  and  the  GRE  quantitative  variables  than 
the  dropout  group. 

At this point it may be instructive to look at the structure
matrix in an effort to gain further insight into the meaning of the
discriminant dimensions and perhaps explain the apparently contrary
result found above. The structure matrix is the p x (K-1) matrix
of correlations between the p original variables and the K-1
discriminant functions. The coefficients are given in Table 5.

Table 5

Discriminant analysis structure matrix

                               Function 1     Function 2

Astin index                     -.4313          .7336
letters of recommendation       -.4969          .2449
GRE verbal                       .3056          .4505
GRE quantitative                -.5729          .0583
undergraduate major              .4648          .1765
verbal-foreign interaction      -.3822         -.2398

These  correlations  corroborate  the  interpretation  given  to  the 
first  discriminant  dimension.   Although  the  numbers  have  changed  slightly, 
they  are  still  within  a  factor  of  two  of  each  other  with  the  same  signs, 
and  thus  the  original  variables  can  still  be  considered  to  be  of  roughly 
equal  importance  to  the  first  discriminant  dimension. 

Utilizing  the  correlations  from  the  structure  matrix  fortunately 
results  in  a  different  and  more  harmonious  interpretation  of  the  second 
discriminant  dimension.   While  the  Astin  index  still  rates  as  the  largest 
contributor  to  this  dimension,  the  effect  of  quantitative  GRE  scores  is 
seen  to  be  eliminated,  with  the  verbal  GRE  variable  taking  its  place. 
The phenomenon whereby it is possible for a variable to have a high
negative loading on a discriminant function but a low correlation
with the function is analogous to the situation in multiple regression
where the standardized regression weight is large in absolute value
but the zero-order correlation with the dependent variable is small.
Both situations arise from the nature of the intercorrelations of the
independent variables. Discriminant function weights (or regression
weights) control for the intercorrelations. The correlations from the
structure matrix are zero-order correlations, with the resultant
possibility of suppression or spuriousness.

Thus  the  profile  of  an  individual  scoring  high  on  the  second 
discriminant  function  is  one  of  a  high  Astin  index  score  and  a  relatively 
high  GRE  verbal  score.   This  does  not  contradict  any  theoretical 
expectations,  but  does  confirm  what  one  might  guess  from  a  glimpse 
of  the  group  means  listed  in  Table  6  and  a  reference  back  to  the 
percentage  of  foreign  students  in  the  two  groups. 

Table 6

Group means on independent variables

Independent                   Dropout     Master's    Ph.D.
variables used                group       group       group       Total

letters of reference           2.7509      2.9107      3.1548      2.9394
undergraduate major            0.7273      0.8214      0.4348      0.6712
GRE verbal                    56.9091     69.9643     53.1304     60.7260
GRE quantitative              89.0000     90.0714     96.6956     91.8356
Astin index                    4.0673      5.1986      5.3696      4.9115
verbal-foreign interaction     4.1818      1.6429      8.0000      4.4110

Variables not used

undergraduate GPA              4.3495      4.4075      4.5735      4.4423
highest degree                 0.3182      0.0714      0.0870      0.1507
CS course GPA                  4.7127      4.6279      4.7126      4.6801
# of CS courses                4.5000      3.7857      5.2174      4.4521


Once again, the effect of the verbal GRE variable is largely due to
its interaction with the foreign student classification variable. It
is therefore tempting to simplify the interpretation of dimension two
even further and claim that the major difference between the master's
group and the dropout group is largely a difference in the
competitiveness of the undergraduate institutions attended by members
of the two groups.

Turning from the area of interpretation to that of prediction
leads to the final results obtained from the discriminant analysis, the
classification functions and predictions.

The classification functions obtained from the statistical
software package SPSS (Statistical Package for the Social Sciences) are
derived from the pooled within-groups covariance matrix and the centroids
for the discriminating variables. There is one function for each group.
Scores on the three functions are obtained by multiplying the raw scores
on each of the variables by the appropriate coefficient and adding in
the constant. A case would be assigned to that group for which its
classification function score is the highest. Table 7 lists the
coefficients. The results obtained by applying the classification
functions to the original sample are given in Table 8.
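The assignment rule just described (weight the raw scores, add the constant, pick the group with the highest score) can be sketched as follows. The coefficients are those listed in Table 7 and the test profiles are the group means from Table 6; the dictionary layout, the short variable names, and the function names are illustrative choices, not part of the SPSS output.

```python
# Order of the raw scores expected by every function below.
VARIABLES = ["letters", "major", "gre_verbal", "gre_quant",
             "astin", "verbal_foreign"]

# group -> (coefficients in VARIABLES order, constant), from Table 7.
COEFFICIENTS = {
    "dropout": ([10.28953, 5.19958, -0.00784, 0.97513, -1.43927, 0.10469],
                -56.50551),
    "masters": ([11.05553, 5.62126, 0.00732, 0.91136, -0.55646, 0.10248],
                -58.33578),
    "phd": ([11.81329, 4.35394, -0.02119, 1.00213, -0.64228, 0.16021],
            -66.38499),
}

# Group means on the six variables used, from Table 6.
GROUP_MEANS = {
    "dropout": [2.7509, 0.7273, 56.9091, 89.0000, 4.0673, 4.1818],
    "masters": [2.9107, 0.8214, 69.9643, 90.0714, 5.1986, 1.6429],
    "phd": [3.1548, 0.4348, 53.1304, 96.6956, 5.3696, 8.0000],
}


def classification_scores(raw):
    """Score one case (raw values in VARIABLES order) on each function."""
    return {group: constant + sum(c * x for c, x in zip(coefs, raw))
            for group, (coefs, constant) in COEFFICIENTS.items()}


def classify(raw):
    """Assign the case to the group whose function score is highest."""
    scores = classification_scores(raw)
    return max(scores, key=scores.get)


# Each group's mean profile should be assigned to its own group.
centroid_assignments = {g: classify(m) for g, m in GROUP_MEANS.items()}
```

Classifying the three mean profiles is only a sanity check on the transcribed coefficients; the hit rates for individual cases are the ones reported in Table 8.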

The  table  shows  that  about  two-thirds  of  the  original  sample 
was  correctly  classified  using  the  derived  classification  functions. 
This  seems  to  be  a  reasonable  figure,  and  it  becomes  even  more  reasonable 
when  the  group  breakdowns  are  taken  into  account.   It  would  be  most 
harmful  to  incorrectly  classify  someone  whose  actual  group  membership 


-17- 


Table  7 
Classification  function  coefficients 


letters  of  recommend. 

undergraduate  major 

GRE  verbal 

GRE  quantitative 

As  tin  index 

verbal-foreign 

constant 


Dropout  gp. 

10.28953 
5.19958 

-0.00784 
0.97513 

-1.43927 

0.10469 

-56.50551 


Master's  gp, 

11.05553 
5.62126 
0.00732 
0.91136 
-0.55646 
0.10248 
-58.33578 


Ph.D.  group 

11.81329 
4.35394 

-0.02119 
1.00213 

-0.64228 

0.16021 

-66.38499 


Table  8 

Predicted 

Group  Memb< 

ership 

Actual  Group 

N 

Group  1 

Group  2 

Group  3 

1  -  Dropout 

22 

13 

5 

4 

59.1% 

22.7% 

18.2% 

2  -  Master's 

28 

6 

16 

6 

21.4% 

57.1% 

21.4% 

3  -  Ph.D. 

23 

2 

3 

18 

8.7% 

13.0% 

78.3% 

%  of  grouped  cases  correctly  classified:   64.38% 


-18- 


turned  out  to  be  the  Ph.D.  group.   Recognizing  as  many  potential  Ph.D.'s 
as  possible  is  highly  desirable  and,  for  the  given  sample,  this  turned 
out  to  be  exactly  the  case.   The  Ph.D.  group  was  by  far  the  most 
distinguishable,  with  18  out  of  the  23  cases  in  that  group  being 
correctly  classified.  ' 

Strength  of  relationship:   The  next  topic  to  be  discussed 
is  the  strength  of  the  relationship  found  in  the  discrminant  analysis. 
There  is  no  directly  comparable  statistic  in  discriminant  analysis  to 
the  multiple  correlation  coefficient  of  regression  analysis,  but  treating 
discriminant  analysis  as  a  special  case  of  canonical  correlation  allows 
the  computation  of  various  measures  of  strength  of  relationship. 

When  the  K  groups  of  the  discriminant  analysis  are  coded  as
K-1  dummy  variables  (i.e.,  1  for  group  membership,  0  for  non-members),
a  canonical  correlation  analysis  can  be  performed,  resulting  in  the
identification  of  K-1  canonical  variate  pairs  and  K-1  canonical
correlations.   The  two  canonical  correlations  obtained  in  the  present 
case  were  .548  and  .431.   However,  although  the  correlations  can  be 
considered  to  be  indicators  of  the  strength  of  the  relationship,  the 


This  is  not  surprising,  since  the  discriminant  function  corresponding  to
the  largest  discriminant  criterion  was  interpreted  as  being  a  discriminator
between  the  Ph.D.  group  and  the  other  two  groups. 

There  are  no  obvious  generalizations  to  make  regarding  the  five  incorrectly 
classified  Ph.D.  students,  but  individual  data  on  each  are  given  in 
Appendix  7. 




interpretation in terms of explained variance is not obvious.  A
squared canonical correlation represents the proportion of variance of
a linear combination of one set of variables that is predicted by a
linear combination of the other set of variables.  What is desired is
not a measure of the variance shared by linear combinations of the two
sets of variables, but rather a measure of the shared variance of the
two sets.  Stewart and Love's (1968) redundancy coefficients provide
such a measure.  These coefficients are asymmetrical measures of the
strength of the relationship between two sets of variables.  There is
one coefficient for the redundancy of set Y given set X, and another
for the redundancy of set X given set Y.  In the present case, there
is a need for only one of the coefficients, since the sets of variables
are clearly delineated into dependent and independent categories.
Letting Y = the set of 2 dummy group variables and X = the set of 6
independent variables, the coefficient of interest is $R^2_{y \cdot x}$,
which is the redundancy of set Y given set X or, equivalently, the
proportion of variance in Y predictable from X.  Computing
$R^2_{y \cdot x}$ from the formula

$$R^2_{y \cdot x} = \sum_{i=1}^{q} \mu_i^2 \left( \sum_{j=1}^{q} a_{y_j i}^2 \right) \Big/ q$$

where  $\mu_i^2$ = the ith squared canonical correlation

       $a_{y_j i}^2$ = the squared correlation between $Y_j$ and the ith
                     canonical variate of the Y set

       $q$ = 2, the number of variables in Y

yields the result that $R^2_{y \cdot x}$ = .2562.  Thus, approximately 25%
of the variance in the group classification is predictable from the
independent variables.  The corresponding $R_{y \cdot x}$ is .506.
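The redundancy formula can be sketched as a short Python function. The two canonical correlations are those reported in the text; the loadings of the dummy variables on the Y-set canonical variates are not reported here, so the loadings below are HYPOTHETICAL placeholders and the first printed value is not the study's result.

```python
import math


def redundancy(canonical_correlations, y_loadings):
    """Redundancy of set Y given set X.

    y_loadings[j][i] is the correlation between Y variable j and the
    i-th canonical variate of the Y set.
    """
    q = len(y_loadings)  # number of variables in Y
    total = 0.0
    for i, r in enumerate(canonical_correlations):
        # Mean squared loading = share of Y variance carried by variate i.
        variance_extracted = sum(row[i] ** 2 for row in y_loadings) / q
        total += r ** 2 * variance_extracted
    return total


CANONICAL_CORRELATIONS = [0.548, 0.431]  # reported in the text

# HYPOTHETICAL loadings (assumption): each variate carries half of the
# variance in the two dummy variables.
s = math.sqrt(0.5)
hypothetical_loadings = [[s, s], [s, -s]]

print(round(redundancy(CANONICAL_CORRELATIONS, hypothetical_loadings), 4))

# The reported redundancy of .2562 corresponds to R = .506.
print(round(math.sqrt(0.2562), 3))
```

When there are as many canonical variates as Y variables, the variates jointly carry all of the variance in Y, so the redundancy is a weighted average of the squared canonical correlations (here about .300 and .186). The reported value of .2562 lies between the two, as it should.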


Multiple  Regression:   The  final  section  briefly  presents  the 
results  of  the  three  multiple  regression  analyses  where  first  semester 
graduate  GPA,  cumulative  graduate  GPA,  and  general  exam  cumulative 
total  were  the  dependent  variables.   A  contrast  is  then  offered  with 
the  discriminant  results. 

The  regressions  were  also  done  in  a  stepwise  fashion,  so 
it  is  possible  to  get  an  idea  of  the  importance  of  each  of  the 
independent  variables  in  the  separate  predictions.   Table  9  lists 
the  multiple  correlations  obtained  at  each  step  in  the  selection 
process  for  each  of  the  dependent  variables.   (Appendix  6  gives  the 
corresponding  regression  weights.)   Several  conclusions  can  be 
reached  upon  examination  of  Table  9.   The  first  is  that  there  is 
little  to  be  gained  by  including  more  than  two  or  three  predictors 
for  each  dependent  variable.   The  multiple  correlations  level  off 
quite  rapidly.   Second,  the  specific  nature  of  each  dependent  variable 
can  be  seen  from  its  predictors.   The  number  of  times  the  general 
exams  were  taken  is  largely  a  function  of  an  individual's  mathematical 
aptitude  and  his  undergraduate  background  in  computer  science.   First 
semester  grade  point  average  and  cumulative  grade  point  average  are 
virtually  indistinguishable,  the  correlation  between  them  being  .92, 




Table 9
Regression summaries

Dependent variable = 1st semester graduate GPA

Variable                 Multiple R     R^2     Change in R^2   simple r

Astin index                .44192      .19530      .19530        .44192
undergraduate GPA          .55133      .30397      .10867        .27711
largest degree             .57205      .32725      .02328       -.01959
verbal-foreign             .57934      .33563      .00838        .07668
# CS courses               .58744      .34508      .00946        .19304
GRE quantitative           .58896      .34687      .00179        .35515
GRE verbal                 .58991      .34799      .00112        .09403
undergraduate major        .59062      .34884      .00084       -.16153
CS GPA                     .59084      .34909      .00025        .14825
letters of reference       .59089      .34915      .00006        .11604

Dependent variable = cumulative graduate GPA

Variable                 Multiple R     R^2     Change in R^2   simple r

Astin index                .39665      .15733      .15733        .39665
undergraduate GPA          .53071      .28165      .12432        .30506
verbal-foreign             .54060      .29225      .01060        .06733
undergraduate major        .55209      .30480      .01255       -.22006
largest degree             .55802      .31138      .00658       -.07921
CS GPA                     .56284      .31678      .00540        .10366
GRE quantitative           .56343      .31745      .00067        .36888
# CS courses               .56383      .31790      .00045        .17834
GRE verbal                 .56406      .31817      .00026        .09745
letters of reference       .56416      .31828      .00011        .16847

Dependent variable = general exam cumulative total

Variable                 Multiple R     R^2     Change in R^2   simple r

GRE quantitative           .44696      .19977      .19977       -.44696
# CS courses               .56043      .31408      .11431       -.43701
Astin index                .59140      .34975      .03567       -.37221
undergraduate GPA          .62592      .39177      .04202       -.30076
CS GPA                     .64095      .41082      .01905       -.01930
verbal-foreign             .65208      .42521      .01439       -.05166
undergraduate major        .65816      .43318      .00797        .33509
largest degree             .65821      .43324      .00006        .11199
GRE verbal                 .65821      .43324      .00000       -.05815
letters of reference       .65821      .43324      .00000       -.21739



and  both  are  functions  of  undergraduate  grade  point  average  and  the 
quality  of  the  undergraduate  institution.   Thus,  although  grade  point 
average  and  number  of  times  the  general  exams  were  taken  are  achievement- 
related  variables,  the  achievements  they  are  related  to  are  rather 
narrow,  specific  criteria.   This  contrasts  rather  sharply  with  the 
notion  of  treating  the  end  product  of  graduate  school  as  the  dependent 
variable,  as  was  done  in  the  discriminant  analysis.   When  the
classification  into  Ph.D.,  master's,  and  dropout  groups  was  made,
the  concepts  of  achievement  and  success  became  much  more  general  and
all-encompassing.   Thus,  although  the  relationship  between  the  group
classification  scheme  and  grade  point  average  is  apparent,  the  groups 
represent  more  than  grade  point  average;  they  represent  graduate 
school  achievement  in  a  more  general  yet  meaningful  sense  of  the 
term. 
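The rapid leveling-off of the multiple correlations in Table 9 can be checked by squaring each multiple R and differencing successive steps. The sketch below uses the first five steps of each regression as listed in Table 9; the variable names are illustrative, and small discrepancies in the fifth decimal place against the printed table are due to rounding of the tabled R values.

```python
# Multiple R at the first five steps of each stepwise regression (Table 9).
MULTIPLE_R = {
    "1st semester graduate GPA":     [0.44192, 0.55133, 0.57205, 0.57934, 0.58744],
    "cumulative graduate GPA":       [0.39665, 0.53071, 0.54060, 0.55209, 0.55802],
    "general exam cumulative total": [0.44696, 0.56043, 0.59140, 0.62592, 0.64095],
}


def r_squared_gains(multiple_r):
    """Return (R^2 at each step, gain in R^2 contributed by each step)."""
    r2 = [r * r for r in multiple_r]
    gains = [r2[0]] + [later - earlier for earlier, later in zip(r2, r2[1:])]
    return r2, gains


for dependent, rs in MULTIPLE_R.items():
    r2, gains = r_squared_gains(rs)
    print(dependent)
    for step, (value, gain) in enumerate(zip(r2, gains), start=1):
        print(f"  step {step}: R^2 = {value:.5f}, gain = {gain:.5f}")
```

In each regression the first two predictors account for most of the explained variance, and every later step shown adds less than .05 to the squared multiple correlation.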

Comparison  of  the  redundancy  coefficient  given  earlier  with 
the  multiple  correlations  obtained  in  the  regression  analyses  shows 
that  the  predictive  power  is  not  as  great  when  the  dependent  variables 
consist  of  the  discriminant  groups.   This  is  explainable  in  terms  of 
the  distinction  made  above  of  the  general  versus  the  specific  nature 
of  the  various  dependent  variables.   Specific,  achievement-related 
variables  such  as  graduate  grade  point  average  and  the  number  of  times 
the  general  exams  were  taken  are  bound  to  be  more  predictable  from 
other  specific,  achievement-related  variables  than  is  the  more 
general,  all-inclusive  group  categorization  variable.   Put  another  way, 




the  contention  here  is  that  although  psychological  variables  such  as 
motivation  and  perseverance  play  a  significant  role  in  affecting 
variables  like  grade  point  average,  they  play  an  even  larger  role  in 
determining  the  ultimate  outcome  of  one's  graduate  school  career.   Earning 
a  Ph.D.  takes  at  least  four  years  of  hard  work,  and  without  the  motivation 
to  stick  it  out,  an  individual's  talents  are  not  utilized. 

A  second  factor  tending  to  decrease  the  predictive  power  in 
the  discriminant  analysis  is  the  simple  fact  that  degree  success  is  farther 
in  time  from  the  measurement  of  the  independent  variables  than  is  the 
case  with  the  dependent  variables  in  the  multiple  regressions.   But 
even  though  there  is  a  slight  loss  in  predictability  in  the  discriminant 
group  case,  this  is  offset  by  the  fact  that  a  better  measure  of  success 
is  being  obtained.   Overall,  it  is  more  desirable  to  have  a  good  prediction 
of  a  good  criterion  variable  (degree  success)  than  a  better  prediction 
of  a  poorer  criterion  (GPA).

SUMMARY  AND  RECOMMENDATIONS: 

This  paper  has  employed  the  multivariate  statistical  technique 
of  discriminant  analysis  in  an  effort  to  uncover  the  predictors  of  a 
successful  graduate  school  career.   The  sample  of  students  was  divided 
into  dropout,  master's,  and  Ph.D.  groups.   The  variables  chosen  as 
discriminating  best  among  the  groups  were,  in  order  of  selection, 
the  Astin  selectivity  index,  letters  of  reference,  GRE  verbal,  GRE 
quantitative,  undergraduate  major,  and  verbal  GRE-foreign  student 




interaction.   Two  discriminant  functions  were  obtained  from  the 
analysis.   The  first  and  more  important  was  interpreted  as  distinguishing 
the  Ph.D.  group  from  the  other  two  groups.   Each  of  the  six  independent 
variables  was  of  roughly  equal  importance  on  this  dimension.   The  second 
function  was  best  thought  of  as  a  discriminator  between  the  master's  and 
the  dropout  groups.   The  Astin  index  was  the  primary  determinant  of  this 
dimension,  with  GRE  verbal  roughly  half  as  important.   The  strength  of 
the  relationship  between  the  group  criterion  and  the  independent  variables 
corresponded  to  a  multiple  correlation  of  .506.   This  was  less  than  was 
obtained  when  predicting,  say,  graduate  GPA,  but  the  closer  correspondence
to  the  concept  of  graduate  school  success  was  thought  to  offset  the 
loss  in  predictability. 

The  study  has  been  moderately  successful,  given  the  small 
sample  upon  which  the  analysis  was  based.   But  regardless  of  the 
strength  of  the  present  results,  the  method  seems  sound,  and  the 
following  recommendations  appear  warranted: 

1.   Since  GRE's  are  not  required  from  MCS  and  MSTCS  applicants, 
process  these  applications  separately.   For  all  other  applicants,  require 
and  record  data  on  all  variables.   There  are  enough  good  candidates 
applying  that  it  shouldn't  be  necessary  to  admit  anyone  unwilling  to  take 
the  GRE's  or  secure  letters  of  reference.   Accurate  and  uniform 
classification  cannot  be  accomplished  if  data  is  missing.   With  regard 
to  GRE  scores,  record  the  scaled  scores  as  well  as  the  corresponding 
percentiles.   The  scaled  scores  are  comparable  over  time  whereas 




the  percentiles  technically  are  not,  since  they  are  based  on  that 
particular  test  administration  only. 

2.  Use  the  classification  functions  of  the  discriminant 
analysis  as  an  assessment  tool  in  judging  the  applicants.   Scores  on 
these  functions  indicate  the  degree  of  similarity  between  an  individual 
and  the  three  group  centroids.   Individuals  scoring  highest  on  the  Ph.D.- 
group  function  and  next  highest  on  the  master 's-group  function  are 
reasonably  good  bets  to  be  good  students.   Although  it  might  seem 
desirable  to  obtain  a  single  predicted  score  for  each  applicant  on 

a  scale  of,  say,  one  to  five,  this  would  involve  transforming  the 
group  criterion  into  a  single  continuous,  interval-scale  variable, 
which  is  statistically  inappropriate. 

The  only  questionable  aspect  in  using  the  classification 
functions  is  that  they  tend  to  favor  foreign  students.   The  high 
percentage  of  foreign  students  in  the  Ph.D.  group  is  manifested  in 
the  effects  of  the  GRE  verbal  and  verbal-foreign  interaction  variables. 
Elimination  of  the  foreign  student  bias  from  the  functions  could 
be  achieved  by  either  computing  new  functions  which  exclude  the 
GRE  verbal  and  verbal-foreign  variables,  or  by  performing  separate 
analyses  for  foreign  and  non-foreign  groups,  as  discussed  below. 

3.  Replicate  the  study  at  a  future  date.   Two  replications 
are  called  for.   The  first  is  a  cross-validation  of  the  results  presented 
here.   This  could  be  done  in  a  year  or  two  when  the  entering  classes 

of  1976  and  1977  have  progressed  to  the  point  where  their  discriminant 




group  membership  has  become  known.   Cross-validation  involves  applying 
the  original  predictive  equations  to  a  new  sample  and  comparing  the 
shrinkage  in  predictive  power  to  that  expected  by  chance.   Since  the 
equations  were  optimized  for  the  original  sample,  a  loss  in  prediction 
will  obviously  occur. 
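A cross-validation of this kind reduces to simple arithmetic on hit rates. The sketch below is illustrative only: the original hit rate and group sizes are taken from Table 8, the new-sample figure is HYPOTHETICAL, and the proportional chance criterion used as the chance baseline is an assumption, being one common choice rather than one named in this report.

```python
def proportional_chance(group_sizes):
    """Expected hit rate if cases were assigned at random in proportion
    to group size: the sum of squared group proportions."""
    n = sum(group_sizes)
    return sum((size / n) ** 2 for size in group_sizes)


def shrinkage(original_hit_rate, new_hit_rate):
    """Loss in classification accuracy when moving to a new sample."""
    return original_hit_rate - new_hit_rate


ORIGINAL_HIT_RATE = 47 / 73        # Table 8: 64.38% correctly classified
GROUP_SIZES = [22, 28, 23]         # dropout, master's, Ph.D. (Table 8)

# HYPOTHETICAL cross-validation result (assumption): 40 of 70 new cases
# classified correctly by the original functions.
HYPOTHETICAL_NEW_HIT_RATE = 40 / 70

chance = proportional_chance(GROUP_SIZES)
loss = shrinkage(ORIGINAL_HIT_RATE, HYPOTHETICAL_NEW_HIT_RATE)
print(f"chance baseline:       {chance:.4f}")
print(f"shrinkage:             {loss:.4f}")
print(f"new rate above chance: {HYPOTHETICAL_NEW_HIT_RATE - chance:.4f}")
```

In an actual replication the chance baseline would be recomputed from the group sizes of the new sample.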

The  second  and  more  important  replication  would  involve  merging 
the  two  samples  into  a  single  sample  of  reasonable  size  and  redoing  the 
entire  analysis.   The  major  shortcoming  of  this  study  was  the  small 
sample  size  upon  which  it  was  based.   The  results,  although  reasonably 
convincing,  are  to  some  extent  the  result  of  sampling  error.   Doubling 
or  tripling  the  sample  size  would  allow  one  to  place  far  greater 
confidence  in  the  reliability  of  the  various  statistics  and  coefficients. 
With  the  passage  of  time,  it  would  also  be  possible  to  use  the  actual 
achievement  of  a  Ph.D.  as  the  basis  for  inclusion  in  the  Ph.D.  group. 

If  the  sample  size  were  sufficiently  large,  an  additional 
recommendation  would  be  to  conduct  separate  analyses  for  foreign  and 
domestic  students.   To  the  extent  that  different  psychological  factors 
exist  within  the  two  groups  with  respect  to  attitudes  toward  graduate 
school,  separate  analyses  would  tend  to  increase  predictive  power 
within  each  group  by  minimizing  the  effects  of  unmeasured,  psychological 
variables. 




References 


Astin, Alexander W., Predicting Academic Performance in College, The Free
Press, New York, 1971.

Nie, Hull, Jenkins, Steinbrenner, and Bent, SPSS, Statistical Package for
the Social Sciences, 2nd Edition, McGraw-Hill, Inc., New York, 1975.

Stewart, D. and Love, W., A general canonical index, Psychological
Bulletin, 1968, 70, 160-163.

Tatsuoka, Maurice M., Multivariate Analysis: Techniques for Educational and
Psychological Research, John Wiley & Sons, Inc., New York, 1971.






APPENDIX  2 
Astin  Index  Rankings  of  Undergraduate  Colleges  Attended 
by  the  Study's  Sample  of  Students 

Ranking  College 

7  Colgate 

7  Princeton 

7  Calif.  Inst.  of  Tech.

7  Univ.  of  Rochester 

7  Davidson 

7  Harvard 

7  Univ.  of  Washington 

7  MIT 

7  Trinity  College  (Conn.) 

6  Whitman  College  (Wash.) 

6  Clarkson  College  of  Tech.  (N.Y.) 

6  Univ.  of  Michigan 

6  Univ.  of  California  -  Berkeley 

6  Carnegie-Mellon 

6  Northwestern 

6  St.  Olaf  College  (Minn.) 

5  Augustana  (Ill.)

5  Univ.  of  Texas,  Austin 

5  Villanova 

5  Univ.  of  Illinois,  Urbana 

5  Queens  College  (N.Y.) 

5  Northeastern  Univ.  (Mass.) 

5  Rockford  College 

5  Univ.  of  Connecticut 

5  Carroll  College  (Wisc.)

5  Univ.  of  Kansas 

4  Western  Illinois  Univ. 

4  Baylor 

4  Loyola  Univ.  (Calif.) 

4  Univ.  of  Missouri,  Columbia 

4  Marietta  College  (Ohio) 

4  Millikin 

4  Indiana 

4  Birmingham  Southern  College  (Ala.) 

3  Southern  Illinois  Univ. 

3  Cleveland  State 

3  DePaul 

3  Univ.  of  New  Mexico 

3  Eastern  Illinois  Univ. 

2  Univ.  of  Illinois,  Chicago  Circle 

2  Univ.  of  South  Carolina,  Lexington 

2  N.  Y.  Inst.  of  Tech.

2  Univ.  of  Missouri,  Rolla 




APPENDIX  3 





APPENDIX  4 





APPENDIX 5
Standard Deviations of Independent Variables

                          Group 1     Group 2     Group 3
Variables used            Dropout     Master's    Ph.D.       Total

Letters of reference        .4686       .5990       .5127       .5522
Major                       .5505       .7228       .5069       .6248
GRE verbal                27.7591     24.9377     33.3662     29.2268
GRE quantitative          13.7840     10.8729      2.6019     10.6318
Astin index                1.4045      1.6333       .6858      1.4270
Verbal-foreign
  interaction              9.0060      5.9330     16.8226     11.4210

Variables not used

Highest prior degree        .4767       .2623       .2881       .3602
Undergraduate GPA           .4475       .4240       .2930       .4010
CS course GPA               .3367       .4565       .3621       .3913
# of CS courses            2.8410      2.0791      2.5218      2.5058



APPENDIX 6

Regression coefficients

Dependent variable = 1st semester graduate GPA

Unstandardized regression coefficients

Variables in eq.          1          2          3          4

Constant                3.0985      .0295     -.5857     -.5484
Astin index              .2544      .2762      .2943      .2986
undergrad GPA                       .6801      .7728      .7544
largest degree                                 .3676      .3249
verbal-foreign                                            .0068

Standardized regression coefficients

Variables in eq.          1          2          3          4

Astin index              .4419      .4798      .5112      .5187
undergrad GPA                       .3318      .3771      .3681
largest degree                                 .1612      .1425
verbal-foreign                                            .0940

Dependent variable = cumulative graduate GPA

Unstandardized regression coefficients

Variables in eq.          1          2          3          4

Constant                3.3721      .3316      .3043      .8218
Astin index              .2075      .2287      .2351      .2336
undergrad GPA                       .6610      .6533      .5594
verbal-foreign                                 .0068      .0084
undergrad major                                          -.1490

Standardized regression coefficients

Variables in eq.          1          2          3          4

Astin index              .3966      .4371      .4494      .4465
undergrad GPA                       .3549      .3508      .3004
verbal-foreign                                 .1038      .1282
undergrad major                                          -.1247

Dependent variable = general exam cumulative total

Unstandardized regression coefficients

Variables in eq.          1          2          3          4

Constant               22.1306    21.7158    20.5155    27.5594
GRE quantitative        -.1503     -.1217     -.0761     -.0499
# CS courses                       -.4974     -.5319     -.4783
Astin index                                   -.5770     -.7540
undergrad GPA                                           -1.9846

Standardized regression coefficients

Variables in eq.          1          2          3          4

GRE quantitative        -.4470     -.3618     -.2262     -.1484
# CS courses                       -.3487     -.3728     -.3352
Astin index                                   -.2303     -.3010
undergrad GPA                                            -.2225


APPENDIX  7 


[Table of individual data on the five incorrectly classified Ph.D.
students; the entries are not legible in this copy.]




BIBLIOGRAPHIC DATA SHEET

Report No.:  UIUCDCS-R-78-925

Title and Subtitle:  PREDICTING GRADUATE SCHOOL SUCCESS

Report Date:  June 1978

Author(s):  Richard H. Nelson

Performing Organization Name and Address:
    Department of Computer Science
    University of Illinois at Urbana-Champaign
    Urbana, Illinois  61801

Sponsoring Organization Name and Address:
    Department of Computer Science
    University of Illinois at Urbana-Champaign
    Urbana, Illinois  61801

Type of Report & Period Covered:  MCS Report

Abstracts

An attempt was made to identify the factors affecting graduate school
success in computer science.  The six most important stepwise-selected
variables were the Astin selectivity index of undergraduate colleges,
letters of reference, GRE verbal, GRE quantitative, undergraduate major,
and verbal GRE-foreign student interaction.  There were two dimensions
along which the Ph.D., master's and dropout groups differed.  The first
separated the Ph.D. from the other two groups.  All variables were of
equal importance on this dimension.  The second separated the master's
from the dropout group.  The Astin index was its primary determinant.
The redundancy coefficient of .504 was less than the multiple correlation
obtained when GPA was the dependent variable.

Key Words and Document Analysis.  Descriptors

    computer science
    graduate school admissions

Security Class (This Report):  UNCLASSIFIED
Security Class (This Page):  UNCLASSIFIED
No. of Pages:  38


